Introduction to statistics using R
Copy and paste the code on this webpage to RStudio Cloud, editing the values to match your data.
0. Getting started
Optional: Use RStudio online (free)
Sign up on RStudio Cloud
You can make an account on RStudio Cloud to use RStudio without downloading software. The service is free for a limited number of hours per month.
Create an R file to save work
Steps after opening RStudio
- Sign in to RStudio Cloud.
- Click New Project > New RStudio Project.
- Click File > New File > R Script.
- Click the save icon and name your file.
Install and load libraries/packages
Add the code below to the top of the R file and click Source. The initial download may take a few minutes. Packages that are already installed are loaded without being downloaded again.
Once every package is loaded, it should no longer be necessary to use library() or package:: in the remaining code.
Finally, the packages in quotes below are all the libraries referenced in this guide. Replace them as needed, using either single (') or double (") quotes.
packages = c('DT', 'dplyr', 'magrittr', 'kableExtra', 'ggplot2', 'plotly', 'stargazer')
package.check <- lapply(
packages,
FUN = function(x) {
if (!require(x, character.only = TRUE)) {
install.packages(x, dependencies = TRUE)
library(x, character.only = TRUE)
}
}
)
Data used in examples
The data used for the examples ships with R, so it is available in RStudio without any download. The data set mtcars describes different car models and is shown below.
Please note: the data set is edited so that car names are included as the first column instead of row names. The row names are unchanged but not displayed.
# Add row names as an extra column (12th)
df <- mtcars
df$car <- rownames(df)
# Reorder columns so that the 12th column (car) comes first
df <- df[,c(12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)]
# Display the data set without row names
DT::datatable(df,
caption = 'Pre-installed data set \'mtcars\'',
rownames = FALSE)
The edits made to the data in the previous code block are necessary for the later examples that reference the car column.
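The index vector c(12, 1, 2, ...) in the code above depends on the data set having exactly 12 columns. As a minimal sketch of an alternative (not part of the original code), the same reordering can be done by column name, which should keep working if columns are added or removed:

```r
# Start from the built-in data set and add the row names as a column
df <- mtcars
df$car <- rownames(df)

# Move 'car' to the front by name rather than by position
df <- df[, c('car', setdiff(names(df), 'car'))]

# Inspect the new column order
names(df)
```

The result should match the data set produced by the index-based version.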
1. Write data
After pasting the code into an R file, replace the comma-separated numbers with your data.
No variables/1 column
List of numbers
The data are provided as a list of numbers without any variables, or a table with only one column. In other words, only one ‘thing’ is being measured.
df <- c(1, 2, 3, 4, 5)
2 or more variables/columns
The letters in quotes should be replaced with the name of the column using either single or double quotes.
Table with 2 columns
df <- data.frame(
'x' = c(1, 2, 3, 4, 5),
'y' = c(6, 7, 8, 9, 10)
)
Table with 3 columns
df <- data.frame(
'x' = c(1, 2, 3, 4, 5),
'y' = c(6, 7, 8, 9, 10),
'z' = c(11, 12, 13, 14, 15)
)
Table with 4 or more columns
Follow the pattern in the previous examples. Each column must have the same number of rows.
For example, if column x is 'x' = c(1, 2, 3) and column y is 'y' = c(11, 12, 13), both columns have 3 rows. If column z is 'z' = c(21, 22), this column could not be included with columns x and y.
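The rule about equal row counts can be checked directly. The sketch below reuses the x, y, and z values from the example above and wraps the failing call in tryCatch() so the script keeps running; the variable names ok and bad are made up for illustration.

```r
x <- c(1, 2, 3)
y <- c(11, 12, 13)
z <- c(21, 22)

# Columns of equal length combine without problems
ok <- data.frame(x = x, y = y)
nrow(ok)

# A column with 2 values cannot be combined with columns of 3 values;
# tryCatch() captures the error message instead of stopping the script
bad <- tryCatch(
  data.frame(x = x, y = y, z = z),
  error = function(e) conditionMessage(e)
)
bad
```

Note that R silently repeats a shorter column when its length divides the longer one evenly (for example 2 values against 4), so it is safest to check the lengths yourself.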
Rename the columns of the data
Replace the letters x, y, or z with your column names, surrounded by quotation marks. In general, do not include spaces or special characters in a name; underscores (_) and numbers (0-9) are fine as long as the name starts with a letter. Avoid dashes (-): they are not valid in R names, and data.frame() converts them to periods by default. Note that names are case-sensitive (name and Name are different). It is good practice to start names with a lower-case letter.
# DO NOT RUN
df <- data.frame(
'good_name' = c(),
'good_name1' = c(),
'goodName' = c(),
'bad name' = c(),
'bad-name' = c(),
'Bad.Name' = c(),
'Really Bad Name' = c()
)
2. Calculate statistics
The output will appear in the Console panel in RStudio.
Measures of central tendency
Median
List df
median(df)
Table df with column x
median(df$x)
= 3
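With an odd number of values, the median is the middle value after sorting, as in the result above. A small sketch with an even number of values (the vector even_values is made up for illustration) shows that median() then averages the two middle values:

```r
odd_values <- c(1, 2, 3, 4, 5)
even_values <- c(1, 2, 3, 4)

med_odd <- median(odd_values)    # middle value: 3
med_even <- median(even_values)  # average of 2 and 3: 2.5

med_odd
med_even
```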
Mode
See custom functions.
Measures of dispersion
Variance
List df
var(df)
Table df with column x
var(df$x)
= 2.5
Standard deviation
List df
sd(df)
Table df with column x
sd(df$x)
= 1.581139
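Both var() and sd() compute sample statistics, dividing by n - 1 rather than n. As a rough check, the sketch below recomputes the two results above by hand; the names manual_var and manual_sd are made up for illustration.

```r
x <- c(1, 2, 3, 4, 5)
n <- length(x)

# Sample variance: squared deviations from the mean, divided by n - 1
manual_var <- sum((x - mean(x))^2) / (n - 1)

# Standard deviation: square root of the variance
manual_sd <- sqrt(manual_var)

manual_var                      # 2.5
all.equal(manual_var, var(x))   # TRUE
all.equal(manual_sd, sd(x))     # TRUE
```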
Standard error
See custom functions.
Custom functions
Copy and paste the code below into an R file, then click Source. To use a function, call it as shown in the code that follows its definition.
Mode
# Sort values by frequency in descending order (helper used by mode())
sort_mode <- function(x) {
temp <- data.frame(table(x))
temp <- temp[order(-temp$Freq),]
rownames(temp) <- NULL
names(temp) <- c('value', 'frequency')
return(temp)
}
# Find the mode(s) (calls sort_mode())
# Note: this definition masks base R's mode(), which reports the storage type of an object
mode <- function(x) {
x <- sort_mode(x)
rows_x <- nrow(x)
max_freq <- max(x$frequency)
x$is_max <- 0
x$is_max[x$frequency==max_freq] <- 1
x <- x[x$is_max==1,]
x <- data.frame(as.numeric(as.character(x$value)))
rownames(x) <- NULL
if(nrow(x)==rows_x){
x = 'no mode'
} else {
names(x) <- c('mode(s):')
}
return(x)
}
List df
mode(df)
Table df for column x
mode(df$x)
= no mode
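The example data has no mode because every value appears once. For comparison, here is a shorter base R sketch on a made-up vector (values) that does have repeated entries. It is an alternative illustration, not the guide's mode() function, and unlike that function it does not report 'no mode' when all values tie.

```r
# Hypothetical data in which 2 and 3 each appear twice
values <- c(1, 2, 2, 3, 3, 4)

# Count how often each value occurs
counts <- table(values)

# Keep every value whose count equals the highest count
modes <- as.numeric(names(counts)[as.vector(counts) == max(counts)])
modes  # 2 3
```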
Standard error
se <- function(x) {
temp <- round(sd(x)/sqrt(length(x)), digits=4)
return(temp)
}
List df
se(df)
Table df for column x
se(df$x)
= 0.7071
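The se() function above returns NA when the data contain missing values, because sd() does. A possible variant, sketched here under the assumption that missing values should simply be dropped (the name se_na is hypothetical), removes them before dividing:

```r
se_na <- function(x) {
  x <- x[!is.na(x)]          # drop missing values first
  sd(x) / sqrt(length(x))    # standard error of the mean
}

# Same five values as the example, plus one missing value
with_missing <- c(1, 2, 3, 4, 5, NA)
result <- se_na(with_missing)
round(result, 4)  # 0.7071
```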
Manually calculate r, r^2, a (y-intercept), and b (slope)
# Create table to manually calculate several statistics (r, r^2, a, and b)
make_table <- function(df) {
require(DT)
df$xy <- df$x*df$y
df$x2 <- df$x**2
df$y2 <- df$y**2
all_sums <- c(sum(df$x), sum(df$y), sum(df$xy), sum(df$x2), sum(df$y2))
df <- rbind(df, all_sums)
rownames(df)[rownames(df)==as.character(nrow(df))] <- 'Total Sums'
table <- DT::datatable(df,
extensions = 'Buttons',
caption = paste('Total observations: n = ', nrow(df)-1),
options = list(
dom = 'Bt',
buttons = c('copy', 'csv', 'excel')
))
return(table)
}
Table df for columns x and y
make_table(df)
Manually calculate continuous probabilities
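The data set uses decimals in the second column, which represent probabilities. When you replace the values with your own, the probabilities should sum to 1; note that the placeholder values below sum to 1.5.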
df <- data.frame(
'x' = c(1, 2, 3, 4, 5),
'y' = c(0.1, 0.2, 0.3, 0.4, 0.5)
)
# Create table to manually calculate continuous probabilities
probability_table <- function(df) {
df$mean <- df$x*df$y
mean <- sum(df$mean)
df <- subset(df, select=-c(mean))
df$`x-m` <- df$x - mean
df$`(x-m)^2` <- round(df$`x-m`**2, digits=3)
df$`(x-m)^2*p(x)` <- round(df$`(x-m)^2`*df$y, digits=3)
names(df)[names(df)=='y'] <- 'p(x)'
var <- round(sum(df$`(x-m)^2*p(x)`), digits=3)
sd <- round(sqrt(sum(df$`(x-m)^2*p(x)`)), digits=3)
var_row <- c('', '', '', '', var)
df <- rbind(df, var_row)
sd_row <- c('', '', '', '', sd)
df <- rbind(df, sd_row)
rownames(df)[rownames(df)==as.character(nrow(df)-1)] <- 'Var.'
rownames(df)[rownames(df)==as.character(nrow(df))] <- 'Std. Dev.'
table <- DT::datatable(df,
extensions = 'Buttons',
caption = paste('See bottom rows for Variance (Var.)',
'and Standard Deviation (Std. Dev.)'),
options = list(
dom = 'Bt',
buttons = c('copy', 'csv', 'excel')
))
return(table)
}
Table df for columns x and y
probability_table(df)
3a. Make basic tables
For data with 2 or more columns only. All examples use the pre-installed data set ‘mtcars’.
Replace Title in each example with your own title, surrounded by quotation marks.
Standard table
Basic HTML table.
library(magrittr)
df%>%
kableExtra::kbl(caption = 'Title')%>%
kableExtra::kable_styling()
| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |
| Valiant | 18.1 | 6 | 225.0 | 105 | 2.76 | 3.460 | 20.22 | 1 | 0 | 3 | 1 |
| Duster 360 | 14.3 | 8 | 360.0 | 245 | 3.21 | 3.570 | 15.84 | 0 | 0 | 3 | 4 |
| Merc 240D | 24.4 | 4 | 146.7 | 62 | 3.69 | 3.190 | 20.00 | 1 | 0 | 4 | 2 |
| Merc 230 | 22.8 | 4 | 140.8 | 95 | 3.92 | 3.150 | 22.90 | 1 | 0 | 4 | 2 |
| Merc 280 | 19.2 | 6 | 167.6 | 123 | 3.92 | 3.440 | 18.30 | 1 | 0 | 4 | 4 |
| Merc 280C | 17.8 | 6 | 167.6 | 123 | 3.92 | 3.440 | 18.90 | 1 | 0 | 4 | 4 |
| Merc 450SE | 16.4 | 8 | 275.8 | 180 | 3.07 | 4.070 | 17.40 | 0 | 0 | 3 | 3 |
| Merc 450SL | 17.3 | 8 | 275.8 | 180 | 3.07 | 3.730 | 17.60 | 0 | 0 | 3 | 3 |
| Merc 450SLC | 15.2 | 8 | 275.8 | 180 | 3.07 | 3.780 | 18.00 | 0 | 0 | 3 | 3 |
| Cadillac Fleetwood | 10.4 | 8 | 472.0 | 205 | 2.93 | 5.250 | 17.98 | 0 | 0 | 3 | 4 |
| Lincoln Continental | 10.4 | 8 | 460.0 | 215 | 3.00 | 5.424 | 17.82 | 0 | 0 | 3 | 4 |
| Chrysler Imperial | 14.7 | 8 | 440.0 | 230 | 3.23 | 5.345 | 17.42 | 0 | 0 | 3 | 4 |
| Fiat 128 | 32.4 | 4 | 78.7 | 66 | 4.08 | 2.200 | 19.47 | 1 | 1 | 4 | 1 |
| Honda Civic | 30.4 | 4 | 75.7 | 52 | 4.93 | 1.615 | 18.52 | 1 | 1 | 4 | 2 |
| Toyota Corolla | 33.9 | 4 | 71.1 | 65 | 4.22 | 1.835 | 19.90 | 1 | 1 | 4 | 1 |
| Toyota Corona | 21.5 | 4 | 120.1 | 97 | 3.70 | 2.465 | 20.01 | 1 | 0 | 3 | 1 |
| Dodge Challenger | 15.5 | 8 | 318.0 | 150 | 2.76 | 3.520 | 16.87 | 0 | 0 | 3 | 2 |
| AMC Javelin | 15.2 | 8 | 304.0 | 150 | 3.15 | 3.435 | 17.30 | 0 | 0 | 3 | 2 |
| Camaro Z28 | 13.3 | 8 | 350.0 | 245 | 3.73 | 3.840 | 15.41 | 0 | 0 | 3 | 4 |
| Pontiac Firebird | 19.2 | 8 | 400.0 | 175 | 3.08 | 3.845 | 17.05 | 0 | 0 | 3 | 2 |
| Fiat X1-9 | 27.3 | 4 | 79.0 | 66 | 4.08 | 1.935 | 18.90 | 1 | 1 | 4 | 1 |
| Porsche 914-2 | 26.0 | 4 | 120.3 | 91 | 4.43 | 2.140 | 16.70 | 0 | 1 | 5 | 2 |
| Lotus Europa | 30.4 | 4 | 95.1 | 113 | 3.77 | 1.513 | 16.90 | 1 | 1 | 5 | 2 |
| Ford Pantera L | 15.8 | 8 | 351.0 | 264 | 4.22 | 3.170 | 14.50 | 0 | 1 | 5 | 4 |
| Ferrari Dino | 19.7 | 6 | 145.0 | 175 | 3.62 | 2.770 | 15.50 | 0 | 1 | 5 | 6 |
| Maserati Bora | 15.0 | 8 | 301.0 | 335 | 3.54 | 3.570 | 14.60 | 0 | 1 | 5 | 8 |
| Volvo 142E | 21.4 | 4 | 121.0 | 109 | 4.11 | 2.780 | 18.60 | 1 | 1 | 4 | 2 |
Other themes
Use a scroll bar to limit the size of a table.
Replace 100 in 100% and 200 in 200px to change the size of the table.
kableExtra::kbl(cbind(df, df)) %>%
kableExtra::kable_paper() %>%
kableExtra::scroll_box(width = "100%", height = "200px")
| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |
| Valiant | 18.1 | 6 | 225.0 | 105 | 2.76 | 3.460 | 20.22 | 1 | 0 | 3 | 1 | 18.1 | 6 | 225.0 | 105 | 2.76 | 3.460 | 20.22 | 1 | 0 | 3 | 1 |
| Duster 360 | 14.3 | 8 | 360.0 | 245 | 3.21 | 3.570 | 15.84 | 0 | 0 | 3 | 4 | 14.3 | 8 | 360.0 | 245 | 3.21 | 3.570 | 15.84 | 0 | 0 | 3 | 4 |
| Merc 240D | 24.4 | 4 | 146.7 | 62 | 3.69 | 3.190 | 20.00 | 1 | 0 | 4 | 2 | 24.4 | 4 | 146.7 | 62 | 3.69 | 3.190 | 20.00 | 1 | 0 | 4 | 2 |
| Merc 230 | 22.8 | 4 | 140.8 | 95 | 3.92 | 3.150 | 22.90 | 1 | 0 | 4 | 2 | 22.8 | 4 | 140.8 | 95 | 3.92 | 3.150 | 22.90 | 1 | 0 | 4 | 2 |
| Merc 280 | 19.2 | 6 | 167.6 | 123 | 3.92 | 3.440 | 18.30 | 1 | 0 | 4 | 4 | 19.2 | 6 | 167.6 | 123 | 3.92 | 3.440 | 18.30 | 1 | 0 | 4 | 4 |
| Merc 280C | 17.8 | 6 | 167.6 | 123 | 3.92 | 3.440 | 18.90 | 1 | 0 | 4 | 4 | 17.8 | 6 | 167.6 | 123 | 3.92 | 3.440 | 18.90 | 1 | 0 | 4 | 4 |
| Merc 450SE | 16.4 | 8 | 275.8 | 180 | 3.07 | 4.070 | 17.40 | 0 | 0 | 3 | 3 | 16.4 | 8 | 275.8 | 180 | 3.07 | 4.070 | 17.40 | 0 | 0 | 3 | 3 |
| Merc 450SL | 17.3 | 8 | 275.8 | 180 | 3.07 | 3.730 | 17.60 | 0 | 0 | 3 | 3 | 17.3 | 8 | 275.8 | 180 | 3.07 | 3.730 | 17.60 | 0 | 0 | 3 | 3 |
| Merc 450SLC | 15.2 | 8 | 275.8 | 180 | 3.07 | 3.780 | 18.00 | 0 | 0 | 3 | 3 | 15.2 | 8 | 275.8 | 180 | 3.07 | 3.780 | 18.00 | 0 | 0 | 3 | 3 |
| Cadillac Fleetwood | 10.4 | 8 | 472.0 | 205 | 2.93 | 5.250 | 17.98 | 0 | 0 | 3 | 4 | 10.4 | 8 | 472.0 | 205 | 2.93 | 5.250 | 17.98 | 0 | 0 | 3 | 4 |
| Lincoln Continental | 10.4 | 8 | 460.0 | 215 | 3.00 | 5.424 | 17.82 | 0 | 0 | 3 | 4 | 10.4 | 8 | 460.0 | 215 | 3.00 | 5.424 | 17.82 | 0 | 0 | 3 | 4 |
| Chrysler Imperial | 14.7 | 8 | 440.0 | 230 | 3.23 | 5.345 | 17.42 | 0 | 0 | 3 | 4 | 14.7 | 8 | 440.0 | 230 | 3.23 | 5.345 | 17.42 | 0 | 0 | 3 | 4 |
| Fiat 128 | 32.4 | 4 | 78.7 | 66 | 4.08 | 2.200 | 19.47 | 1 | 1 | 4 | 1 | 32.4 | 4 | 78.7 | 66 | 4.08 | 2.200 | 19.47 | 1 | 1 | 4 | 1 |
| Honda Civic | 30.4 | 4 | 75.7 | 52 | 4.93 | 1.615 | 18.52 | 1 | 1 | 4 | 2 | 30.4 | 4 | 75.7 | 52 | 4.93 | 1.615 | 18.52 | 1 | 1 | 4 | 2 |
| Toyota Corolla | 33.9 | 4 | 71.1 | 65 | 4.22 | 1.835 | 19.90 | 1 | 1 | 4 | 1 | 33.9 | 4 | 71.1 | 65 | 4.22 | 1.835 | 19.90 | 1 | 1 | 4 | 1 |
| Toyota Corona | 21.5 | 4 | 120.1 | 97 | 3.70 | 2.465 | 20.01 | 1 | 0 | 3 | 1 | 21.5 | 4 | 120.1 | 97 | 3.70 | 2.465 | 20.01 | 1 | 0 | 3 | 1 |
| Dodge Challenger | 15.5 | 8 | 318.0 | 150 | 2.76 | 3.520 | 16.87 | 0 | 0 | 3 | 2 | 15.5 | 8 | 318.0 | 150 | 2.76 | 3.520 | 16.87 | 0 | 0 | 3 | 2 |
| AMC Javelin | 15.2 | 8 | 304.0 | 150 | 3.15 | 3.435 | 17.30 | 0 | 0 | 3 | 2 | 15.2 | 8 | 304.0 | 150 | 3.15 | 3.435 | 17.30 | 0 | 0 | 3 | 2 |
| Camaro Z28 | 13.3 | 8 | 350.0 | 245 | 3.73 | 3.840 | 15.41 | 0 | 0 | 3 | 4 | 13.3 | 8 | 350.0 | 245 | 3.73 | 3.840 | 15.41 | 0 | 0 | 3 | 4 |
| Pontiac Firebird | 19.2 | 8 | 400.0 | 175 | 3.08 | 3.845 | 17.05 | 0 | 0 | 3 | 2 | 19.2 | 8 | 400.0 | 175 | 3.08 | 3.845 | 17.05 | 0 | 0 | 3 | 2 |
| Fiat X1-9 | 27.3 | 4 | 79.0 | 66 | 4.08 | 1.935 | 18.90 | 1 | 1 | 4 | 1 | 27.3 | 4 | 79.0 | 66 | 4.08 | 1.935 | 18.90 | 1 | 1 | 4 | 1 |
| Porsche 914-2 | 26.0 | 4 | 120.3 | 91 | 4.43 | 2.140 | 16.70 | 0 | 1 | 5 | 2 | 26.0 | 4 | 120.3 | 91 | 4.43 | 2.140 | 16.70 | 0 | 1 | 5 | 2 |
| Lotus Europa | 30.4 | 4 | 95.1 | 113 | 3.77 | 1.513 | 16.90 | 1 | 1 | 5 | 2 | 30.4 | 4 | 95.1 | 113 | 3.77 | 1.513 | 16.90 | 1 | 1 | 5 | 2 |
| Ford Pantera L | 15.8 | 8 | 351.0 | 264 | 4.22 | 3.170 | 14.50 | 0 | 1 | 5 | 4 | 15.8 | 8 | 351.0 | 264 | 4.22 | 3.170 | 14.50 | 0 | 1 | 5 | 4 |
| Ferrari Dino | 19.7 | 6 | 145.0 | 175 | 3.62 | 2.770 | 15.50 | 0 | 1 | 5 | 6 | 19.7 | 6 | 145.0 | 175 | 3.62 | 2.770 | 15.50 | 0 | 1 | 5 | 6 |
| Maserati Bora | 15.0 | 8 | 301.0 | 335 | 3.54 | 3.570 | 14.60 | 0 | 1 | 5 | 8 | 15.0 | 8 | 301.0 | 335 | 3.54 | 3.570 | 14.60 | 0 | 1 | 5 | 8 |
| Volvo 142E | 21.4 | 4 | 121.0 | 109 | 4.11 | 2.780 | 18.60 | 1 | 1 | 4 | 2 | 21.4 | 4 | 121.0 | 109 | 4.11 | 2.780 | 18.60 | 1 | 1 | 4 | 2 |
Highlight a specific row when hovering with a cursor.
library(magrittr)
df %>%
kableExtra::kbl(caption = 'Title') %>%
kableExtra::kable_paper("hover")
| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |
| Valiant | 18.1 | 6 | 225.0 | 105 | 2.76 | 3.460 | 20.22 | 1 | 0 | 3 | 1 |
| Duster 360 | 14.3 | 8 | 360.0 | 245 | 3.21 | 3.570 | 15.84 | 0 | 0 | 3 | 4 |
| Merc 240D | 24.4 | 4 | 146.7 | 62 | 3.69 | 3.190 | 20.00 | 1 | 0 | 4 | 2 |
| Merc 230 | 22.8 | 4 | 140.8 | 95 | 3.92 | 3.150 | 22.90 | 1 | 0 | 4 | 2 |
| Merc 280 | 19.2 | 6 | 167.6 | 123 | 3.92 | 3.440 | 18.30 | 1 | 0 | 4 | 4 |
| Merc 280C | 17.8 | 6 | 167.6 | 123 | 3.92 | 3.440 | 18.90 | 1 | 0 | 4 | 4 |
| Merc 450SE | 16.4 | 8 | 275.8 | 180 | 3.07 | 4.070 | 17.40 | 0 | 0 | 3 | 3 |
| Merc 450SL | 17.3 | 8 | 275.8 | 180 | 3.07 | 3.730 | 17.60 | 0 | 0 | 3 | 3 |
| Merc 450SLC | 15.2 | 8 | 275.8 | 180 | 3.07 | 3.780 | 18.00 | 0 | 0 | 3 | 3 |
| Cadillac Fleetwood | 10.4 | 8 | 472.0 | 205 | 2.93 | 5.250 | 17.98 | 0 | 0 | 3 | 4 |
| Lincoln Continental | 10.4 | 8 | 460.0 | 215 | 3.00 | 5.424 | 17.82 | 0 | 0 | 3 | 4 |
| Chrysler Imperial | 14.7 | 8 | 440.0 | 230 | 3.23 | 5.345 | 17.42 | 0 | 0 | 3 | 4 |
| Fiat 128 | 32.4 | 4 | 78.7 | 66 | 4.08 | 2.200 | 19.47 | 1 | 1 | 4 | 1 |
| Honda Civic | 30.4 | 4 | 75.7 | 52 | 4.93 | 1.615 | 18.52 | 1 | 1 | 4 | 2 |
| Toyota Corolla | 33.9 | 4 | 71.1 | 65 | 4.22 | 1.835 | 19.90 | 1 | 1 | 4 | 1 |
| Toyota Corona | 21.5 | 4 | 120.1 | 97 | 3.70 | 2.465 | 20.01 | 1 | 0 | 3 | 1 |
| Dodge Challenger | 15.5 | 8 | 318.0 | 150 | 2.76 | 3.520 | 16.87 | 0 | 0 | 3 | 2 |
| AMC Javelin | 15.2 | 8 | 304.0 | 150 | 3.15 | 3.435 | 17.30 | 0 | 0 | 3 | 2 |
| Camaro Z28 | 13.3 | 8 | 350.0 | 245 | 3.73 | 3.840 | 15.41 | 0 | 0 | 3 | 4 |
| Pontiac Firebird | 19.2 | 8 | 400.0 | 175 | 3.08 | 3.845 | 17.05 | 0 | 0 | 3 | 2 |
| Fiat X1-9 | 27.3 | 4 | 79.0 | 66 | 4.08 | 1.935 | 18.90 | 1 | 1 | 4 | 1 |
| Porsche 914-2 | 26.0 | 4 | 120.3 | 91 | 4.43 | 2.140 | 16.70 | 0 | 1 | 5 | 2 |
| Lotus Europa | 30.4 | 4 | 95.1 | 113 | 3.77 | 1.513 | 16.90 | 1 | 1 | 5 | 2 |
| Ford Pantera L | 15.8 | 8 | 351.0 | 264 | 4.22 | 3.170 | 14.50 | 0 | 1 | 5 | 4 |
| Ferrari Dino | 19.7 | 6 | 145.0 | 175 | 3.62 | 2.770 | 15.50 | 0 | 1 | 5 | 6 |
| Maserati Bora | 15.0 | 8 | 301.0 | 335 | 3.54 | 3.570 | 14.60 | 0 | 1 | 5 | 8 |
| Volvo 142E | 21.4 | 4 | 121.0 | 109 | 4.11 | 2.780 | 18.60 | 1 | 1 | 4 | 2 |
Highlight rows on hover without stretching the table to the full width of the page.
library(magrittr)
df %>%
kableExtra::kbl(caption = 'Title') %>%
kableExtra::kable_paper("hover", full_width = FALSE)
| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |
| Valiant | 18.1 | 6 | 225.0 | 105 | 2.76 | 3.460 | 20.22 | 1 | 0 | 3 | 1 |
| Duster 360 | 14.3 | 8 | 360.0 | 245 | 3.21 | 3.570 | 15.84 | 0 | 0 | 3 | 4 |
| Merc 240D | 24.4 | 4 | 146.7 | 62 | 3.69 | 3.190 | 20.00 | 1 | 0 | 4 | 2 |
| Merc 230 | 22.8 | 4 | 140.8 | 95 | 3.92 | 3.150 | 22.90 | 1 | 0 | 4 | 2 |
| Merc 280 | 19.2 | 6 | 167.6 | 123 | 3.92 | 3.440 | 18.30 | 1 | 0 | 4 | 4 |
| Merc 280C | 17.8 | 6 | 167.6 | 123 | 3.92 | 3.440 | 18.90 | 1 | 0 | 4 | 4 |
| Merc 450SE | 16.4 | 8 | 275.8 | 180 | 3.07 | 4.070 | 17.40 | 0 | 0 | 3 | 3 |
| Merc 450SL | 17.3 | 8 | 275.8 | 180 | 3.07 | 3.730 | 17.60 | 0 | 0 | 3 | 3 |
| Merc 450SLC | 15.2 | 8 | 275.8 | 180 | 3.07 | 3.780 | 18.00 | 0 | 0 | 3 | 3 |
| Cadillac Fleetwood | 10.4 | 8 | 472.0 | 205 | 2.93 | 5.250 | 17.98 | 0 | 0 | 3 | 4 |
| Lincoln Continental | 10.4 | 8 | 460.0 | 215 | 3.00 | 5.424 | 17.82 | 0 | 0 | 3 | 4 |
| Chrysler Imperial | 14.7 | 8 | 440.0 | 230 | 3.23 | 5.345 | 17.42 | 0 | 0 | 3 | 4 |
| Fiat 128 | 32.4 | 4 | 78.7 | 66 | 4.08 | 2.200 | 19.47 | 1 | 1 | 4 | 1 |
| Honda Civic | 30.4 | 4 | 75.7 | 52 | 4.93 | 1.615 | 18.52 | 1 | 1 | 4 | 2 |
| Toyota Corolla | 33.9 | 4 | 71.1 | 65 | 4.22 | 1.835 | 19.90 | 1 | 1 | 4 | 1 |
| Toyota Corona | 21.5 | 4 | 120.1 | 97 | 3.70 | 2.465 | 20.01 | 1 | 0 | 3 | 1 |
| Dodge Challenger | 15.5 | 8 | 318.0 | 150 | 2.76 | 3.520 | 16.87 | 0 | 0 | 3 | 2 |
| AMC Javelin | 15.2 | 8 | 304.0 | 150 | 3.15 | 3.435 | 17.30 | 0 | 0 | 3 | 2 |
| Camaro Z28 | 13.3 | 8 | 350.0 | 245 | 3.73 | 3.840 | 15.41 | 0 | 0 | 3 | 4 |
| Pontiac Firebird | 19.2 | 8 | 400.0 | 175 | 3.08 | 3.845 | 17.05 | 0 | 0 | 3 | 2 |
| Fiat X1-9 | 27.3 | 4 | 79.0 | 66 | 4.08 | 1.935 | 18.90 | 1 | 1 | 4 | 1 |
| Porsche 914-2 | 26.0 | 4 | 120.3 | 91 | 4.43 | 2.140 | 16.70 | 0 | 1 | 5 | 2 |
| Lotus Europa | 30.4 | 4 | 95.1 | 113 | 3.77 | 1.513 | 16.90 | 1 | 1 | 5 | 2 |
| Ford Pantera L | 15.8 | 8 | 351.0 | 264 | 4.22 | 3.170 | 14.50 | 0 | 1 | 5 | 4 |
| Ferrari Dino | 19.7 | 6 | 145.0 | 175 | 3.62 | 2.770 | 15.50 | 0 | 1 | 5 | 6 |
| Maserati Bora | 15.0 | 8 | 301.0 | 335 | 3.54 | 3.570 | 14.60 | 0 | 1 | 5 | 8 |
| Volvo 142E | 21.4 | 4 | 121.0 | 109 | 4.11 | 2.780 | 18.60 | 1 | 1 | 4 | 2 |
Additional information can be found in the kableExtra documentation.
3b. Make interactive tables
For data with 2 or more columns only. Lists will not work.
Standard table
Replace Title with a title for the table.
DT::datatable(df,
rownames = FALSE,
caption = paste('Title')
)
With column filters
DT::datatable(df,
rownames = FALSE,
extensions = 'Buttons',
caption = paste('Title'),
options = list(
dom = 'Bt'),
filter = 'top')
Other themes
Replace Title with a title for the table.
DT::datatable(df,
rownames = FALSE,
caption = paste('Title'),
class = 'cell-border stripe'
)
DT::datatable(df,
rownames = FALSE,
extensions = 'Buttons',
caption = paste('Title'),
class = 'cell-border stripe',
options = list(
dom = 'Bt',
buttons = c('copy', 'csv', 'excel')
))
Additional information can be found in the DT documentation.
4. Statistical figures
The following section references columns within data sets. To reference a specific column for a data set df, we use a dollar sign $ afterwards and write the name of the column:
# DO NOT RUN
df$name_of_column
The column names of the data set used for this example can be found below.
## car mpg cyl disp hp drat wt qsec vs am gear carb
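A short sketch of how the list of column names above can be produced and how a single column is referenced. It rebuilds df from mtcars so it runs on its own; the double-bracket form df[['mpg']] is shown as an equivalent way to select a column by name.

```r
# Rebuild the example data set
df <- mtcars
df$car <- rownames(df)

# List every column name
names(df)

# First three values of the column mpg
head(df$mpg, 3)

# The dollar sign and double brackets select the same column
identical(df$mpg, df[['mpg']])  # TRUE
```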
Scatter plot
Replace mpg and wt with the column names of quantitative data. Note that each column name is preceded with a tilde ~ without a space. Replace Title with a title for the figure.
library(magrittr)
plotly::plot_ly(
data = df,
x = ~wt,
y = ~mpg)%>%
plotly::layout(title='Title')
Scatter plot with a legend
Replace mpg and wt with the column names of quantitative data. Note that each column name is preceded with a tilde ~ without a space. Replace Title with a title for the figure.
Continuous variable
Replace cyl with a column that will determine the color each point receives. If the data is numerical, it will create a gradient by default.
library(magrittr)
plotly::plot_ly(
data = df,
x = ~wt,
y = ~mpg,
color = ~cyl)%>%
plotly::layout(title='Title')
Discrete variable
Although the column cyl is numerical, only three unique values appear in it. Wrapping the column in factor() treats it as qualitative data, where each unique number is a group.
Replace cyl with qualitative data, including numerical data with discrete values only.
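To see what factor() does before plotting, the following base R sketch compares the numeric column with its factor version. The variable names cyl_numeric and cyl_factor are made up for illustration.

```r
cyl_numeric <- mtcars$cyl
cyl_factor <- factor(mtcars$cyl)

is.numeric(cyl_numeric)  # TRUE: treated as a continuous quantity
levels(cyl_factor)       # "4" "6" "8": three groups
table(cyl_factor)        # number of cars in each group
```

Each level becomes its own legend entry and color when the factor is passed to the color argument, as in the plot code that follows.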
library(magrittr)
plotly::plot_ly(
data = df,
x = ~wt,
y = ~mpg,
color = ~factor(cyl))%>%
plotly::layout(title='Title')
Bar chart
library(magrittr)
plotly::plot_ly(
data = df,
x = ~car,
y = ~mpg,
type = 'bar')%>%
plotly::layout(title='Title')
Pareto chart
library(magrittr)
plotly::plot_ly(
data = df,
x = ~reorder(car, -mpg),
y = ~mpg,
type = 'bar')%>%
plotly::layout(title='Title')%>%
plotly::layout(xaxis = list(title='car'))
Box-and-whisker plot
plotly::plot_ly(data = df, y=~mpg, type = 'box', hoverinfo = 'y', name = '')%>%
plotly::layout(title = 'Title')
Histogram
plotly::plot_ly(data=df, x=~mpg, type='histogram')
plotly::plot_ly(data=df, x=~mpg, type='histogram', histnorm='probability')
5. Regression models
Using the data mtcars, create an ordinary least-squares (OLS) regression model where the miles per gallon (mpg) of each car is a function of its weight (wt).
Create an OLS regression model
\[miles\ per\ gallon = \alpha + \beta(weight) + \epsilon\]
\[mpg = \alpha + \beta \cdot wt + \epsilon\]
lm(mpg~wt, data=df)
##
## Call:
## lm(formula = mpg ~ wt, data = df)
##
## Coefficients:
## (Intercept) wt
## 37.285 -5.344
Summarize a model
summary(lm(mpg~wt, data=df))
##
## Call:
## lm(formula = mpg ~ wt, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.5432 -2.3647 -0.1252 1.4096 6.8727
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.2851 1.8776 19.858 < 2e-16 ***
## wt -5.3445 0.5591 -9.559 1.29e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared: 0.7528, Adjusted R-squared: 0.7446
## F-statistic: 91.38 on 1 and 30 DF, p-value: 1.294e-10
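The coefficients and R-squared reported in this summary can be reproduced from the column sums that the make_table() helper lists (sums of x, y, xy, x^2, and y^2). The sketch below uses the standard textbook formulas for the slope b, the intercept a, and the correlation r, then compares them with lm(); it is an illustration under those formulas, and the variable names are chosen here for readability.

```r
x <- mtcars$wt
y <- mtcars$mpg
n <- length(x)

# Slope (b) and intercept (a) from the column sums
b <- (n * sum(x * y) - sum(x) * sum(y)) / (n * sum(x^2) - sum(x)^2)
a <- mean(y) - b * mean(x)

# Correlation coefficient (r) from the same sums
r <- (n * sum(x * y) - sum(x) * sum(y)) /
  sqrt((n * sum(x^2) - sum(x)^2) * (n * sum(y^2) - sum(y)^2))

# Compare with the fitted model
fit <- lm(mpg ~ wt, data = mtcars)
round(c(a = a, b = b, r = r, r_squared = r^2), 4)
all.equal(unname(coef(fit)), c(a, b))       # TRUE
all.equal(r^2, summary(fit)$r.squared)      # TRUE
```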
stargazer::stargazer(lm(mpg~wt, data=df), type='html')
| | Dependent variable: |
|---|---|
| | mpg |
| wt | -5.344*** |
| | (0.559) |
| Constant | 37.285*** |
| | (1.878) |
| Observations | 32 |
| R^2 | 0.753 |
| Adjusted R^2 | 0.745 |
| Residual Std. Error | 3.046 (df = 30) |
| F Statistic | 91.375*** (df = 1; 30) |
| Note: | *p<0.1; **p<0.05; ***p<0.01 |
From the results, the following models are created:
OLS model:
\[mpg = 37.29 - 5.34wt + \epsilon\]
Fitted model:
\[\hat{\text{MPG}} = 37.29 - 5.34 \cdot \text{WEIGHT}\]
Using the fitted model, we predict the miles per gallon (\(\hat{\text{MPG}}\)) of a car from its weight (\(\text{WEIGHT}\), recorded in mtcars in units of 1,000 lbs) by plugging in the weight of the car and solving the equation. In other words, multiply the weight by 5.34 and subtract the product from 37.29.
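The plug-in calculation can also be left to R. The sketch below predicts the miles per gallon of a hypothetical car with wt = 3 (3,000 lbs, since mtcars records weight in units of 1,000 lbs) using predict(), then repeats the arithmetic by hand with the unrounded coefficients stored in the model.

```r
fit <- lm(mpg ~ wt, data = mtcars)

# Hypothetical car weighing 3,000 lbs
new_car <- data.frame(wt = 3)

# Prediction from the fitted model
predicted <- predict(fit, newdata = new_car)

# Same calculation by hand: intercept + slope * weight
by_hand <- coef(fit)[['(Intercept)']] + coef(fit)[['wt']] * 3

round(predicted, 2)
all.equal(unname(predicted), by_hand)  # TRUE
```

Predictions are most trustworthy for weights inside the range of the data used to fit the model.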
Draw the line of best fit
Basic scatter plot from the data set.
plot(x=df$wt, y=df$mpg)
Draw the regression line over the scatter plot.
plot(x=df$wt, y=df$mpg)
abline(lm(mpg~wt, data=df))
Change the color of the regression line and change the labels of the scatter plot.
plot(x=df$wt, y=df$mpg,
main='Title',
xlab='x-axis',
ylab='y-axis')
abline(lm(mpg~wt, data=df), col='red')
Change the points of the scatter plot (pch accepts symbol codes 0 to 25; option 19 is filled-in circles).
# options 0 - 25 for pch
plot(x=df$wt, y=df$mpg,
main='Title',
xlab='x-axis',
ylab='y-axis',
pch=19)
abline(lm(mpg~wt, data=df), col='red')
Group the observations
plot(x=df$wt, y=df$mpg, col=factor(df$cyl),
main='Title',
xlab='x-axis',
ylab='y-axis',
pch=19)
legend("topright",
title='Legend title',
legend=levels(factor(df$cyl)),
pch=19,
# colors follow the order of the factor levels so legend labels match the points
col=seq_along(levels(factor(df$cyl))))
abline(lm(mpg~wt, data=df), col='red')